Depth-dependent microbial metagenomes sampled in the northeastern Indian Ocean

The northeastern Indian Ocean exhibits distinct hydrographic characteristics influenced by various local and remote forces. Variations in these driving factors may alter the physicochemical properties of seawater, such as dissolved oxygen levels, and affect the diversity and function of microbial communities. How microbial communities change across water depths spanning a dissolved oxygen gradient is not well understood. Here we employed both 16S rDNA amplicon and metagenomic sequencing to study the microbial communities collected from different water depths along the E87 transect in the northeastern Indian Ocean. Samples were collected from the surface, Deep Chlorophyll Maximum (DCM), Oxygen Minimum Zone (OMZ), and bathypelagic layers. Proteobacteria were prevalent throughout the water column, while Thermoproteota were abundant in the aphotic layers. A total of 675 non-redundant metagenome-assembled genomes (MAGs) were constructed, spanning 21 bacterial and 5 archaeal phyla. The community structure and genomic information provided by this dataset offer valuable resources for the analysis of microbial biogeography and metabolism in the northeastern Indian Ocean.


Background & Summary
The Indian Ocean is bordered by the Southern Ocean to the south and enclosed by continental shelves and land masses on its other sides, covering approximately 20% of the global surface ocean. The hydrographic characteristics of the Indian Ocean are influenced by a multitude of geological and physicochemical processes, including tectonic activities [1], oceanic circulation patterns [2], boundary currents [3], climate modes [4], and land-ocean interactions [5]. Three distinct biomes have been proposed based on the biogeochemical characteristics of the Indian Ocean [6]: the oligotrophic subtropical southern Indian Ocean, the iron-deficient low-productivity equatorial region, and the nutrient-rich high-productivity northern Indian Ocean [6-8]. The northern Indian Ocean, particularly the Bay of Bengal (BoB), also receives significant freshwater discharge [9,10] and atmospheric deposition [11], which increase surface productivity and strengthen stratification. In conjunction with the limited oxygen supply from deep overturning circulation and lateral advection in the northern Indian Ocean [2], the mid-depth waters between approximately 200 and 1000 m are extensively oxygen deficient, forming two large oxygen minimum zones (OMZs) in the Arabian Sea and the Bay of Bengal [12]. Collectively, the areas covered by OMZs in the Indian Ocean account for more than half of the global OMZs (59%) [13], with the Bay of Bengal standing as the world's largest hypoxic bay [14]. Despite significant seasonal variations in monsoon winds and biological productivity, dissolved oxygen concentrations in these regions exhibit relatively minor fluctuations [15].
Marine microorganisms play a central role in driving various elemental cycles within the global ocean due to their high abundance, immense diversity, and versatile metabolic capacity [16,17]. The Indian Ocean has a great influence on global biogeochemical cycles, contributing around 15% of oceanic net primary production [18] and hosting a higher abundance of picocyanobacteria than most other oceanic basins [19]. Dissolved oxygen is one of the most important factors controlling microbial respiration and biogeochemical transformation in marine environments [20]. In oxygen-deficient waters, alternative electron acceptors such as nitrate are used or preferred by diverse marine organisms [21,22]. OMZs are characterized by significant redox gradients, and the nitrogen cycle dominates their biogeochemical processes [23,24]. The continuous expansion of marine OMZs will be accompanied by more widespread anammox and denitrification activities, which will have a profound influence on nitrogen bioavailability in marine environments [25]. To better understand the role of biological communities within OMZs, it is important to study their diversity, metabolic function, and ecological relationships [26,27].
In this study, we conducted a comprehensive sampling expedition from April 15 to June 20, 2020, along the E87 transect in the northeastern Indian Ocean, spanning from 10°S off the East India coast to 15°N in the BoB. A total of 25 water samples were collected from various depths, including the surface (5 m, n = 7), DCM (n = 7), OMZ (n = 6), and bathypelagic (Bathy) layers (2000 m, n = 5), for studying microbial diversity and metabolic potential (Fig. 1). Detailed sample metadata, including geographic locations and environmental factors, can be found in Table S1. Flow cytometry analysis showed that the abundances of Prochlorococcus and picoeukaryotes reached their maxima in the DCM layer. In contrast, a higher abundance of Synechococcus was observed near the surface (Table S1). The 16S rDNA amplicon data revealed that Proteobacteria constituted the dominant phylum, accounting for 49.31% of all reads. Within Proteobacteria, Alphaproteobacteria accounted for 59.72%, while Gammaproteobacteria represented 29.44%. Notably, Gammaproteobacteria dominated in both the OMZ and Bathy waters. Cyanobacteria, on the other hand, were primarily distributed in the DCM and shallower layers, accounting for 12.46% of all reads. Thermoproteota (Marine Group I archaea, MGI) emerged as a significant component of the OMZ layer, accounting for 8.77% of all reads. MGII (Marine Group II archaea) was predominantly found in the DCM. Although the relative abundance of MGIII (Marine Group III archaea) was low across the water column, it was significantly higher in the OMZ layer than in other layers (Fig. 2 and Table S2).
Fig. 1 Sampling sites and layers along the E87 transect in the northeastern Indian Ocean. Surface, the surface layer at 5 m. DCM, the Deep Chlorophyll Maximum layer. OMZ, the oxygen minimum zone layer. Bathy, the bathypelagic layer at 2000 m. Detailed sample metadata can be found in Table S1.
Fig. 2 The relative abundance of different taxa across depths based on 16S rDNA amplicon sequencing in the northeastern Indian Ocean. Amplicon sequences were denoised and grouped into Amplicon Sequence Variants (ASVs) to calculate microbial relative abundance in each sample. Detailed 16S rDNA taxonomy assignment can be found in Table S2.
Fig. 3 The phylogenomic tree of 571 bacterial MAGs reconstructed from the northeastern Indian Ocean. The universally conserved 160 single-copy marker genes were used to build this maximum-likelihood phylogenomic tree with 1000 bootstraps. Detailed MAG taxonomy assignment, together with completeness and contamination information, can be found in Table S2.
Complementary to the MAG-based analysis, genes were called at the contig level to construct a community-level gene catalog. After gene calling and deduplication, a total of 9,908,058 unique genes were recovered and functionally annotated with KEGG Orthology (KO) groups. The relative abundance of each unique gene in each sample was calculated as RPKM values. Gene sequences and a table of gene abundance across samples with functional annotations are provided (see the "Data records" section).
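The RPKM normalization mentioned for the gene catalog can be illustrated with a minimal Python sketch. It assumes the standard definition (reads per kilobase of gene length per million mapped reads); the text does not state which tool or exact formula the authors used, and the function name and arguments are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of gene per Million mapped reads (standard definition).

    Assumption: this is the conventional formula, not necessarily the exact
    procedure used to build the published abundance table.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> millions of reads)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical gene: 100 reads mapped, 1000 bp long, 1 million mapped reads
    print(rpkm(100, 1000, 1_000_000))
```

Dividing by gene length and by sequencing depth makes values comparable between genes of different sizes and between samples sequenced to different depths.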

Materials and Methods
Sample collection and preparation. Samples were collected from the northeastern Indian Ocean, spanning latitude 10°S to 15°N along longitude 87°E, during the R/V "Shiyan3" cruise from April 15 to June 20, 2020 (Fig. 1). A total of 25 seawater samples were collected from 9 distant sites, covering both surface waters and deeper ocean regions. Fifteen liters of seawater were pre-filtered using a 20 μm nylon mesh (Sefar Nitex, Sweden), followed by filtration through a 0.22 μm pore size polycarbonate filter (Millipore, MA, USA). The filters were frozen in liquid nitrogen onboard and kept at −20 °C until DNA extraction. For microbial abundance estimation, 2 mL seawater samples were first filtered through a 20 μm nylon mesh, then fixed with 1% (vol/vol) glutaraldehyde, incubated in the dark for 15 minutes, promptly frozen in liquid nitrogen, and preserved at −20 °C for subsequent analysis. In-situ measurements of water temperature, salinity, dissolved oxygen (DO), and fluorescence were conducted using conductivity-temperature-depth (CTD) oceanic profilers (SBE-911 Plus). Other chemical parameters, including nitrite nitrogen, nitrate nitrogen, phosphate, and silicate concentrations, were assessed using the Technicon AA3 Auto-Analyzer (Bran-Luebbe, Germany) [33]. Samples were named following the pattern "station_name-water_depth". For instance, the sample name "S10-1-5" indicates that the sample was taken at station "S10-1" at a depth of "5" meters.
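Because station names can themselves contain hyphens (as in "S10-1"), a sample name has to be split at its last hyphen rather than its first. The following Python sketch shows one way to do this; the function name is hypothetical, and the sketch assumes that depths are written as whole numbers of meters, as in the example above.

```python
def parse_sample_name(sample_name):
    """Split 'station_name-water_depth' into (station, depth in meters).

    Assumption: the depth is the final hyphen-separated field and is an
    integer, as in the example 'S10-1-5' given in the text.
    """
    station, sep, depth = sample_name.rpartition("-")
    if not sep or not station or not depth.isdigit():
        raise ValueError(f"unexpected sample name: {sample_name!r}")
    return station, int(depth)


if __name__ == "__main__":
    station, depth_m = parse_sample_name("S10-1-5")
    print(station, depth_m)
```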
DNA extraction and sequencing. The phenol-chloroform-isoamyl alcohol method was applied to extract microbial DNA, as described previously [34]. DNA quality and concentration were assessed using 1% agarose gel electrophoresis and an Invitrogen Qubit 2.0 Fluorometer (ThermoFisher Scientific), respectively. The V4-V5 hypervariable regions of the 16S rRNA gene were amplified using a universal primer pair, 515Y (5′-GTGYCAGCMGCCGCGGTAA-3′) and 926R (5′-CCGYCAATTYMTTTRAGTTT-3′) [35]. The amplified fragments were sequenced on the Illumina HiSeq 2500 platform using paired-end 2 × 250 bp chemistry as described previously [36]. To ensure data quality, raw 16S rDNA amplicon reads were subjected to adapter trimming and quality control using the cutadapt v4.0 and fastqc v0.12.1 plugins wrapped in the QIIME2 toolkit suite (version 2022.2) [37]. Amplicon sequence variants (ASVs) and a feature table were generated using the deblur v1.1.1 plugin in QIIME2 [38]. The taxonomy of representative ASV sequences was then assigned using the QIIME2 feature-classifier plugin with a sklearn classifier pre-trained on the SILVA database (release 138) clustered at 99% identity (Fig. S1).
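Both primers contain IUPAC degenerate bases (Y, M, R), so each represents a small pool of concrete oligonucleotides. The Python sketch below expands a degenerate primer into a regular expression and counts the sequences it covers. It is an illustration only and is not part of the published pipeline; the helper names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

PRIMER_515Y = "GTGYCAGCMGCCGCGGTAA"   # forward primer from the text
PRIMER_926R = "CCGYCAATTYMTTTRAGTTT"  # reverse primer from the text


def primer_to_regex(primer):
    """Translate a degenerate primer into a regex matching every variant."""
    return "".join(IUPAC[base] for base in primer.upper())


def count_variants(primer):
    """Number of concrete sequences encoded by a degenerate primer."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base].strip("[]"))
    return total


if __name__ == "__main__":
    pattern = re.compile(primer_to_regex(PRIMER_515Y))
    print(pattern.pattern, count_variants(PRIMER_515Y))
    print(bool(pattern.fullmatch("GTGCCAGCAGCCGCGGTAA")))
```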
Qualified DNA samples were fragmented using the Covaris Ultrasonicator M220 (Covaris, USA) with a fragment size of ~500 bp. The resulting DNA fragments were subsequently used for library preparation and metagenomic sequencing on an Illumina HiSeq 2500 platform using paired-end 2 × 150 bp chemistry. All sequencing was carried out at MAGIGENE (Magigene Biotech, Guangzhou, China).
Fig. 4 The phylogenomic tree of 104 archaeal MAGs reconstructed from the northeastern Indian Ocean. The universally conserved 49 single-copy marker genes were used to build this maximum-likelihood phylogenomic tree with 1000 bootstraps. Detailed MAG taxonomy assignment, together with completeness and contamination information, can be found in Table S2.

Phylogenomic tree construction.
The 160 bacterial and 49 archaeal conserved single-copy marker genes were extracted from these MAGs using GTDB-Tk v2.3.2 [53]. Only marker genes found in ≥30 MAGs were selected to construct the bacterial and archaeal phylogenomic trees. MUSCLE v5 [55] was used to align the marker gene sequences extracted from the MAGs, and BMGE [56] was then used to prune the alignments. Phylogenomic trees were constructed using IQTree v2.0.3 [57] with the optimal models (Bacteria: -m Q.pfam+F+I -B 1000; Archaea: -m LG+F+R5 -B 1000) estimated by ModelFinder [58]. The confidence of the maximum-likelihood trees was estimated using 1000 bootstrap replicates.
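The marker selection rule (keep only marker genes found in at least 30 MAGs) can be sketched in Python as follows. The input structure, a mapping from marker gene identifier to the set of MAGs in which it was detected, is an assumption made for illustration; the actual intermediate file formats are not described in the text, and the identifiers in the example are made up.

```python
def select_markers(marker_to_mags, min_mags=30):
    """Return marker genes detected in at least `min_mags` MAGs.

    marker_to_mags: dict mapping a marker gene ID to the set of MAG IDs
    carrying it (hypothetical structure, assumed for this sketch).
    """
    return sorted(
        marker
        for marker, mags in marker_to_mags.items()
        if len(set(mags)) >= min_mags
    )


if __name__ == "__main__":
    # Toy presence table with made-up identifiers
    presence = {
        "marker_A": {f"MAG_{i}" for i in range(45)},  # kept (45 MAGs)
        "marker_B": {f"MAG_{i}" for i in range(30)},  # kept (exactly 30)
        "marker_C": {f"MAG_{i}" for i in range(12)},  # dropped (12 MAGs)
    }
    print(select_markers(presence))
```

Filtering out rarely recovered markers before alignment limits the amount of missing data in the concatenated alignment used for tree inference.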